Ruoxi Sun

What's Pulling the Strings? Evaluating Integrity and Attribution in AI Training and Inference through Concept Shift

Apr 28, 2025
Via arXiv

Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection

Apr 02, 2025
Via arXiv

Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL

Apr 01, 2025
Via arXiv

Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies

Feb 04, 2025
Via arXiv

SETS: Leveraging Self-Verification and Self-Correction for Improved Test-Time Scaling

Jan 31, 2025
Via arXiv

Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments

Jan 18, 2025
Via arXiv

Data-Centric Improvements for Enhancing Multi-Modal Understanding in Spoken Conversation Modeling

Dec 20, 2024
Via arXiv

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Nov 12, 2024
Via arXiv

AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems

Nov 09, 2024
Via arXiv

Edge Unlearning is Not "on Edge"! An Adaptive Exact Unlearning System on Resource-Constrained Devices

Oct 15, 2024
Via arXiv